Prototype Extraction and Adaptive OCR

Authors

  • Yihong Xu
  • George Nagy
Abstract

To maintain OCR accuracy with decreasing quality of page image composition, production, and digitization, it is essential to tune the system to each document. We propose a prototype extraction method for document-specific OCR systems. The method automatically generates training samples from unsegmented text images and the corresponding transcripts. It is tolerant of transcription errors, so a transcript produced automatically by an imperfect omnifont OCR system can be used. The method is based on new algorithms for estimating character widths, character locations in a word, and match/nonmatch probabilities from unsegmented text. An experimental word recognition system is designed and developed to combine prototype extraction algorithms and segmentation-free word recognition. The system can adapt itself to different page images and achieve high recognition accuracy on heavily degraded print.

Index Terms: Optical character recognition, adaptive classification, template matching, segmentation, document image analysis, text
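
The abstract mentions estimating character widths from unsegmented text and its transcript, without giving details. As a rough illustration only, and not necessarily the authors' algorithm, one simple way to approach that step is to model each word's pixel width as the sum of the widths of its characters and solve for the per-character widths by least squares. The function name `estimate_char_widths`, the `ridge` parameter, and the sample data below are all hypothetical, and the sketch ignores inter-character spacing and transcription errors.

```python
def estimate_char_widths(words, widths, ridge=1e-6):
    """Least-squares estimate of per-character widths from whole-word widths.

    Assumption (not from the paper): word width = sum of its character widths.
    A small ridge term keeps the normal equations solvable when some
    characters never appear in distinguishing combinations.
    """
    chars = sorted(set("".join(words)))
    index = {c: i for i, c in enumerate(chars)}
    n = len(chars)

    # Build the normal equations (A^T A + ridge * I) x = A^T b,
    # where A[w][c] is the number of times character c occurs in word w.
    ata = [[ridge if i == j else 0.0 for j in range(n)] for i in range(n)]
    atb = [0.0] * n
    for word, width in zip(words, widths):
        counts = [0] * n
        for c in word:
            counts[index[c]] += 1
        for i in range(n):
            if counts[i] == 0:
                continue
            atb[i] += counts[i] * width
            for j in range(n):
                ata[i][j] += counts[i] * counts[j]

    # Solve by Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(ata[r][col]))
        ata[col], ata[pivot] = ata[pivot], ata[col]
        atb[col], atb[pivot] = atb[pivot], atb[col]
        for r in range(col + 1, n):
            factor = ata[r][col] / ata[col][col]
            for k in range(col, n):
                ata[r][k] -= factor * ata[col][k]
            atb[r] -= factor * atb[col]

    # Back substitution.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        rest = sum(ata[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (atb[i] - rest) / ata[i][i]
    return dict(zip(chars, x))


if __name__ == "__main__":
    # Made-up transcript words and measured word widths in pixels.
    sample_words = ["ab", "ba", "aab", "b", "abb"]
    sample_widths = [16, 16, 26, 6, 22]
    print(estimate_char_widths(sample_words, sample_widths))
```

Because every word contributes one equation, a page of text usually gives many more equations than unknowns, which is what lets a least-squares fit absorb some measurement noise; handling wrong transcripts robustly would need more than this sketch provides.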

Similar resources

Adaptive traffic road sign panels text extraction

In this paper we present an approach to the detection and extraction of text in road sign panels. Text strings, indicators and signs extraction is efficiently performed so OCR algorithms can recognize different characters that may be present on the traffic plane. In a first step, basic color segmentation and shape classification is done for the purpose of detecting possible rectangular planes. ...

Extraction, Enhancement and OCR

In this paper we address the problem of text extraction, enhancement and recognition in digital video. Compared with optical character recognition (OCR) from document images, text extraction and recognition in digital video presents several new challenges. First, the text in video is often embedded in complex backgrounds, making text extraction and separation difficult. Second, image data contain...

Adaptive Neuro Fuzzy Inference System based Optical Character Recognition

Optical character recognition (OCR) is nowadays becoming a powerful tool in the field of character recognition. In the existing globalized environment, OCR can play a vital role in different application fields. Basically, the OCR technique converts images into an editable format. This technique converts images of documents into a form in which we can edit, modify and store data more safely for long ...

Generalization of Hindi OCR Using Adaptive Segmentation and Font Files

In this chapter, we describe an adaptive Indic OCR system implemented as part of a rapidly retargetable language tool effort and extend work found in [20, 2]. The system includes script identification, character segmentation, training sample creation, and character recognition. For script identification, Hindi words are identified in bilingual or multilingual document images using features of t...

Identification and Robust Fault Detection of Industrial Gas Turbine Prototype Using LLNF Model

In this study, detection and identification of common faults in industrial gas turbines are investigated. We propose a model-based robust fault detection (FD) method based on multiple models. For residual generation, a bank of Local Linear Neuro-Fuzzy (LLNF) models is used. Moreover, in the fault detection step, a passive approach based on an adaptive threshold is employed. To achieve this purpose, the a...

Journal:
  • IEEE Trans. Pattern Anal. Mach. Intell.

Volume 21

Publication date: 1999